Google smart speakers are starting to sound like Gemini

PCWorld

A smattering of Google Home users are reporting that their Nest speakers are--when asked the right voice command--chatting with a new voice, a sign that the promised Gemini makeover for Google Assistant is starting to roll out. In a video posted on Reddit, a Google Nest Mini user asked "Hey Google, what's up," and got an unusually loquacious reply in a new voice: "What's happening right now is that we're on a giant rock moving through space at 1,000 miles an hour and orbiting a giant star made up mostly of hydrogen. Also, we're chatting, which I enjoy." When the Nest user asked a more basic follow-up question about the weather, Google Assistant answered in its regular voice with a typical weather report. According to 9to5Google, you can tell if the Gemini-enhanced Assistant has made its way to your Nest speakers by asking, "Hey Google, what's up?"


OpenAI released its advanced voice mode to more people. Here's how to get it.

MIT Technology Review

The update also adds new voices. Shortly after the launch of GPT-4o, OpenAI was criticized for the similarity between the female voice in its demo videos, named Sky, and that of Scarlett Johansson, who played an AI love interest in the movie Her. OpenAI then removed the voice. Now it has launched five new voices, named Arbor, Maple, Sol, Spruce, and Vale, which will be available in both the standard and advanced voice modes. MIT Technology Review has not heard them yet, but OpenAI says they were made using professional voice actors from around the world.


Scarlett Johansson Says OpenAI Ripped Off Her Voice for ChatGPT

WIRED

Last week OpenAI revealed a new conversational interface for ChatGPT with an expressive synthetic voice strikingly similar to that of the AI assistant played by Scarlett Johansson in the sci-fi movie Her--only to suddenly disable the new voice over the weekend. On Monday, Johansson issued a statement claiming to have forced that reversal, after her lawyers demanded OpenAI clarify how the new voice was created. Johansson's statement, relayed to WIRED by her publicist, claims that OpenAI CEO Sam Altman asked her last September to provide ChatGPT's new voice but that she declined. She describes being astounded to see the company demo a new voice for ChatGPT last week that sounded like her anyway. "When I heard the release demo I was shocked, angered, and in disbelief that Mr. Altman would pursue a voice that sounded so eerily similar to mine that my closest friends and news outlets could not tell the difference," the statement reads.


Creating New Voices using Normalizing Flows

Bilinski, Piotr, Merritt, Thomas, Ezzerg, Abdelhamid, Pokora, Kamil, Cygert, Sebastian, Yanagisawa, Kayoko, Barra-Chicote, Roberto, Korzekwa, Daniel

arXiv.org Artificial Intelligence

Creating realistic and natural-sounding synthetic speech remains a major challenge for voice identities unseen during training. As there is growing interest in synthesizing the voices of new speakers, here we investigate the ability of normalizing flows in text-to-speech (TTS) and voice conversion (VC) modes to extrapolate from speakers observed during training and create unseen speaker identities. First, we develop an approach for TTS and VC, and then we comprehensively evaluate our methods and baselines in terms of intelligibility, naturalness, speaker similarity, and the ability to create new voices. We use both objective and subjective metrics to benchmark our techniques on two evaluation tasks: zero-shot and new-voice speech synthesis. The goal of the former task is to measure the precision of conversion to an unseen voice; the goal of the latter is to measure the ability to create new voices. Extensive evaluations demonstrate that the proposed approach consistently achieves state-of-the-art performance in zero-shot speech synthesis and creates a variety of new voices unobserved in the training set. We consider this work to be the first attempt to synthesize new voices based on mel-spectrograms and normalizing flows, along with a comprehensive analysis and comparison of the TTS and VC modes.


Ever wanted to hear Donald Trump speaking Hindi? Try the AI tool that can clone anyone's voice

Daily Mail - Science & tech

He has one of the most instantly recognisable voices in Britain, but have you ever wondered what David Attenborough would sound like speaking German? Well, now you can find out, thanks to a new AI tool that can clone anyone's voice and make them say anything in multiple languages. The tool, by ElevenLabs, requires just a few seconds of audio, and even maintains the speaker's original tone of voice. Creators hope this will 'expand the horizons' in numerous fields including publishing, game development and the media. You can try it yourself on ElevenLabs' website using your own voice or that of your favourite celebrity!


New voice cloning AI lets "you" speak multiple languages

#artificialintelligence

This article is an installment of Future Explored, a weekly guide to world-changing technology. You can get stories like this one straight to your inbox every Thursday morning by subscribing here. In January, Microsoft unveiled an AI that can clone a speaker's voice after hearing them talk for just three seconds. While this system, VALL-E, was far from the first voice cloning AI, its accuracy and need for such a small audio sample set a new bar for the tech. Microsoft has now raised that bar again with an update called "VALL-E X," which can clone a voice from a short sample (4 to 10 seconds) and then use it to synthesize speech in a different language, all while preserving the original speaker's voice, emotion, and tone.


Happy International Women's Day!

AIHub

To celebrate International Women's Day, we take a look back over the past year and highlight some of the women we've interviewed, written about, chatted to, and featured on AIhub. Rose Nakasi is a Lecturer of Computer Science and a Research Scientist at the Makerere Artificial Intelligence Lab, in Makerere University, Uganda. She holds a PhD in Computer Science from Makerere University. Her research interests are in artificial intelligence and data science, and particularly in the use of these for developing improved automated tools and techniques for microscopy diagnosis of diseases like malaria in low-resourced but highly endemic settings. We spoke to Rose Nakasi about her work developing machine learning techniques to aid diagnosis of microscopically diagnosed diseases: Interview with Rose Nakasi: using machine learning and smartphones to help diagnose malaria.


AIhub monthly digest: January 2023 – low-resource language projects, Earth's nightlights and a Lanfrica milestone

AIHub

Welcome to our January 2023 monthly digest, where you can catch up with any AIhub stories you may have missed, get the low-down on recent events, and much more. This month, we highlight some of the projects pertaining to low-resource languages, hear about counterfactual explanations for land cover mapping, and find out about machine learning techniques for night-time remote sensing. We are delighted to share the second article in our focus series on "AI around the world": Natural Language Processing for low-resource languages. This time we enter the domain of natural language processing and highlight some of the work and initiatives being carried out on low-resource languages. In our latest episode of New voices in AI, Srija Chakraborty tells us about her work applying machine learning techniques to night-time remote sensing for measuring nightlights from a variety of natural and artificial sources.


This Voice Doesn't Exist - Generative Voice AI

#artificialintelligence

Recently it seems everybody is talking about generative AI. Deep learning-powered large language and text-to-image models like ChatGPT, Stable Diffusion, DALL-E and Midjourney have caused much fuss in the tech world, and beyond. Many include them among the most significant recent developments in AI. Whether or not you agree, the general sentiment seems to be that something very powerful has arrived. In 2023 we'll hear about models that can help you draw or create videos.


A year of new voices in AI

AIHub

What started as a pile of loose post-it-note ideas transformed into nine interviews over the course of 2022. It has been a great privilege to speak to so many great researchers this year; here is a quick summary of all the interviews, covering everything from NLP, to conservation, to swarm robotics. The series began with David Adelani, talking about his work on NLP for low-resource languages. In the second episode, Isabel Cachola talked about how she got into AI and her work on the interpretability of NLP models.